NEW MEMBERS OF THE GLYPICAN GENE FAMILY

The present invention relates to the
characterization and chromosomal localization of new
members of the glypican gene family and to the use of
members of this family in diagnostics and/or
therapeutics.

Glypicans are glypiated cell surface heparan
sulfate proteoglycans, the first of which was originally
identified in human lung fibroblasts (David et al.,
1990). The five known members of this family have similar
core protein sizes (about 60 kDa), share a unique and
highly conserved cysteine spacing, and are linked to the
cell membrane by a glycosyl phosphatidyl inositol (GPI)
anchor. These five known glypicans are of vertebrate
origin and include glypican (glypican-1, David et al.,
supra), cerebroglycan (glypican-2, Stipp et al., 1994),
OCI-5 (glypican-3, Filmus et al., 1995), K-glypican
(glypican-4, Watanabe et al., 1995) and glypican-5
(Veugelers et al., 1997).

All the structural features of the vertebrate
glypicans are also present in the product of dally
(division abnormally delayed), a locus identified in
Drosophila melanogaster by genetic screening for mutants
affecting cell division patterning in the developing
central nervous system (Nakato et al., 1995). Besides
disturbing cell cycling in the nervous system, dally
mutations also affect viability and produce morphological
defects in several adult tissues, including the eyes,
antennae, wings and genitalia. The dally mutants, the
well-established co-receptor activities of the cell
surface proteoglycans for various ligands that are known
to mediate developmental instructions, and the tissue-
and stage-specific expression of the glypicans all
implicate the glypican group of integral membrane
proteoglycans in the control of cell division and
patterning during development. This contention has
recently been corroborated by the identification of
mutations in GPC3, the gene coding for the human
homologue of OCI-5 (glypican-3), that cause the
Simpson-Golabi-Behmel overgrowth syndrome (SGBS) (Pilia
et al., 1996). This X-linked condition, which clinically
has to be differentiated from the autosomal
Beckwith-Wiedemann syndrome, is characterized by pre- and
post-natal overgrowth with visceral and skeletal
anomalies, and is associated with a high risk of
developing embryonal tumors, including Wilms' tumor and
neuroblastoma.

It was therefore anticipated by the present
inventors that chromosomal assignment of the genes for
the members of the glypican family and the identification
of potential additional members of this family may be
of general relevance for the understanding of somatic
overgrowth and tumor predisposition. So far, only
glypican, the homologue of OCI-5 (glypican-3) and
glypican-5 have been identified in human. The
corresponding genes GPC1, GPC3 and GPC5 have been
localized to chromosomes 2q35-q37, Xq26 and 13q32,
respectively (Vermeesch et al., 1995; Pilia et al., 1996;
Veugelers et al., 1997). The cDNA nucleotide and derived
amino acid sequences of these genes are given in figures
3, 4 and 5, respectively.

It is thus the object of the present invention 
to provide new members of the glypican family, and to
study their possible implications in various medical
indications. It is a further object of the invention to
use the information derivable from the members of the 
glypican gene family for designing diagnostic methods and 
kits and/or for the development of therapeutics. 

In the research that led to the present
invention two novel human cDNAs were identified encoding
glypican-related proteins. The corresponding gene for the
first was mapped to chromosome 13q32. In this application
this gene will be identified as GPC6, whereas the protein
encoded by the gene will be called glypican-6. The
predicted primary structure of the GPC6 protein was found
to show significant sequence similarity to glypican
(glypican-1), to the human homologue of OCI-5
(glypican-3), to glypican-5, to the glypican-related
proteins cerebroglycan (glypican-2 of the rat) and
K-glypican (glypican-4 of the mouse), and to the gene
product of the dally locus in Drosophila melanogaster.
The similarity pertains to a conserved sequence motif
that is present in all seven proteins and that includes a
set of 14 conserved cysteine residues found in specific
positions. Additional similarities include the overall
sizes of the proteins, the presence of N-terminal and
C-terminal signal peptide-like sequences (the first
predicted to be involved in the membrane translocation of
the nascent polypeptide, the second in the temporary
membrane anchorage and subsequent glypiation of the
proteins), and the presence of glycosaminoglycan
attachment consensus sequences close to the C-termini of
the proteins.

Glypican-6 is, however, more similar to
K-glypican and human glypican-4 (see below) than to the
other glypicans. As to these other glypicans, glypican-6
is more similar to glypican-1 and glypican-2 than to
glypican-3 and glypican-5. The gene encoding glypican-6
(GPC6) has an exon-intron organization similar to that of
the gene encoding glypican-4 (GPC4, as was now found
according to the invention) and the gene encoding
glypican-1 (GPC1). This organization differs from the
exon-intron organization of the gene encoding glypican-3
(GPC3) and that of the gene encoding glypican-5 (GPC5),
while GPC3 and GPC5, in turn, resemble one another in
terms of their intron-exon organizations. This indicates
the possible existence of (at least two) glypican
subfamilies: one comprising, so far, the glypicans 1, 2,
4 and 6; the other comprising, so far, the glypicans 3
and 5.

According to a further aspect of the invention,
a cDNA is provided that encodes the human homologue of
K-glypican (glypican-4), the corresponding gene of which
localizes to chromosome Xq26 in very close proximity to
the gene for glypican-3. Thus the GPC3 and GPC4 genes are
adjacent, or near adjacent, to one another on chromosome



WO 99/37764 PCT/EP99/00329 


Xq26, while the GPC5 and GPC6 genes are adjacent, or near
adjacent, to one another on chromosome 13q32. In these
two examples a member (GPC3 and GPC5, respectively) of
one of the glypican subfamilies is physically linked to a
member (GPC4 and GPC6, respectively) of another glypican
subfamily. This indicates that the glypican subfamilies
and the various members of these families may have arisen
from the duplications of one ancestral glypican gene and
ancestral gene cluster. Maintenance of the physical
associations between these genes during evolution
suggests that these genes may also be functionally
linked.

Furthermore, it was found according to the
invention that, besides the gene for glypican-3 (GPC3),
the gene for glypican-4 (GPC4) may also show aberrations
in patients having disorders and diseases involving
abnormal cell growth and behavior, like somatic
overgrowth and tumor formation. With this information
various diagnostic tests could be developed in order to
detect aberrations in the genes that encode glypicans and
aberrations in the expression levels of these genes.
Moreover, this knowledge can be used to develop
therapeutic compounds that restore the physical damage
caused by the mutant gene.

The aberrations in the gene comprise for
example deletions or translocations within either or both
of the two genes, but also mutations in either or both of
them. These aberrations may lead to the absence of gene
products or to abnormal gene products. Thus, the
expression level of the gene may be used as another
parameter indicating the presence of one or more aberrant
genes.

In the research that led to the invention
various aberrations have been found in patients suffering
from Simpson-Golabi-Behmel syndrome, including deletion
of the entire GPC4 gene and exons 7 and 8 of GPC3,
deletion of exons 1 and 2 of GPC3, a T>A mutation in exon
3 of GPC3 leading to a W296>R substitution in glypican-3,
a C>T mutation in exon 8 of GPC4 leading to an A442>V
substitution in glypican-4, and deletion of one dT
nucleotide in GPC3 resulting in a frame shift mutation
and premature termination of the protein.
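
The two missense mutations listed above can be illustrated at the codon level. The following minimal Python sketch shows how a single T>A change in the tryptophan codon produces an arginine codon, and how a single C>T change in an alanine codon produces a valine codon. It is an illustration only: TGG is the sole tryptophan codon in the standard genetic code, but the alanine codon GCC is an assumption (any GCN codon behaves the same way), and the helper names `substitute` and `describe` are hypothetical. The actual codons are those of the sequences in figures 4 and 2.

```python
# Minimal excerpt of the standard genetic code (only the codons needed here).
CODON_TABLE = {
    "TGG": "W",  # tryptophan (its only codon)
    "AGG": "R",  # arginine
    "GCC": "A",  # alanine (assumed codon; GCA, GCG and GCT also encode alanine)
    "GTC": "V",  # valine
}


def substitute(codon: str, pos: int, ref: str, alt: str) -> str:
    """Return the codon with base `ref` at 0-based `pos` replaced by `alt`."""
    if codon[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos} of {codon}")
    return codon[:pos] + alt + codon[pos + 1:]


def describe(codon: str, pos: int, ref: str, alt: str, residue: int) -> str:
    """Format the amino acid change in the W296>R style used in the text."""
    mutant = substitute(codon, pos, ref, alt)
    return f"{CODON_TABLE[codon]}{residue}>{CODON_TABLE[mutant]}"


# T>A at the first base of the Trp codon (glypican-3, residue 296).
print(describe("TGG", 0, "T", "A", 296))
# C>T at the second base of an Ala codon (glypican-4, residue 442).
print(describe("GCC", 1, "C", "T", 442))
```
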

The findings for GPC3 and GPC4 can be
extrapolated to other members of the glypican gene
family. GPC5 and GPC6 are likewise associated.
Aberrations in these genes can be identified in a similar
manner as herein described for GPC3 and GPC4.

Various molecular biological techniques can be
used to find these types of aberration in the genes.
These techniques include but are not limited to single
strand conformation polymorphism (SSCP) screening,
restriction fragment length polymorphism (RFLP)
screening, gel electrophoresis, Southern blot analysis,
PCR, DNA sequencing, etc.

Diagnostic methods according to the invention
are based on the information derivable from the gene
and/or its gene product. Such information comprises the
nucleotide sequence, either sense or antisense, of the
gene and the complementary strand thereof, and the amino
acid sequence of the gene product encoded by the coding
sequence of the gene. The information derivable from the
gene or gene product can very well be defined by a person
skilled in the art by referring to figures 1 to 6.
Figures 1 to 5 disclose the nucleotide sequence of the
human cDNAs for glypicans 1 and 3 to 6, as well as the
derived amino acid sequence of the protein encoded by the
cDNA. Figure 6 gives an alignment of the predicted amino
acid sequences and the position of the exon boundaries
for each of them. This information can be used to define
so-called derivatives.

Derivatives of the nucleotide sequence of the
gene are the gene itself, either isolated or synthetic;
fragments of the gene, either isolated or synthetic and
having a length that is smaller than the complete gene;
primers, comprising at least 10 consecutive gene-specific
nucleotides, preferably about 20 gene-specific
consecutive nucleotides of the nucleotide sequence of the
gene; longer oligonucleotides up to the full length of
the sequence of the gene; antisense variants of the gene,
the fragments or the primers; antibodies directed to the
gene, fragments, primers or complementary strands
thereof; any specific ligand for DNA that can be used as
a specific probe; and peptide nucleic acid probes.

Other derivatives are transcripts (mRNA
sequences) of the gene, from which in turn cDNA,
antisense RNA, antisense cDNA, antibodies directed to the
transcript, sense and antisense cDNA, antisense RNA and
any specific ligand for RNA that can be used as a
specific probe can be derived.

Derivatives of the amino acid sequence include
the isolated or synthetic gene product (also called
protein or polypeptide); isolated or synthetic peptides,
comprising a specific sequence of consecutive amino acids
encoded by the gene; antibodies directed to the gene
product or peptides; and any specific ligand for peptides
that can be used as a specific probe.

Other derivatives are heparan sulfate chains or
heparan sulfate structures, antibodies directed to
heparan sulfate structures present on the product of the
natural or synthetic gene as a result of the
posttranslational modification of these gene products,
and any specific ligand for heparan sulfate that can be
used as a specific probe.

Furthermore, the gene or cDNA may be used for
the transfection of cells, which transfection results in
cells expressing or secreting the desired glypican. The
transfected cells can be used to produce transgenic
animals therefrom, which, in case the gene is an aberrant
gene, can be used to study the effect of the aberration
or to test medicaments. Alternatively, natural glypicans
may be isolated or recombinant (wild type or mutated)
glypicans produced (in transfected cells or transgenic
animals) for use as therapeuticals. Such therapeuticals
may be used to mimic the biological effects of the
glypicans (control of cell growth and differentiation),
in attempts to remedy the effects of absolute or relative
deficiencies of these genes or to enhance the effects of
the normal genes. With the appropriate modifications of
the glypican gene sequences, therapeuticals based on
modified glypican gene sequences may also be used to
block the effects of the glypicans, in attempts to remedy
the effects of absolute or relative overexpressions or
activities of the products of the various glypican genes.

As a non-limiting example, recombinant soluble
glypicans may be used as decoy receptors for antagonizing
the effects of factors that depend on membrane-anchored
glypicans, whereas the delivery of
membrane-intercalatable glypicans to cells may restore
cellular sensitivity to these factors.

Diagnostic methods according to the invention
comprise but are not limited to the following.

A method for diagnosing aberrations in a
glypican-encoding gene comprises isolation of the gene
from cells expected to be harboring an aberrant gene, and
comparison of the nucleotide sequence of the gene thus
obtained with the nucleotide sequence of a wild type
gene. The term "wild type gene" as used in this
application is intended to encompass a gene from a
non-affected individual. The sequences given in figures 1
to 5 are representative of wild type gene sequences.

Comparison between the potentially aberrant and
wild type nucleotide sequences can be performed at
various levels. On a first level it can be established
whether the expected aberration(s) has (have) resulted in
restriction fragment length polymorphism. In order to do
this the isolated gene and a wild type comparison gene
are separately digested with one or more selected
restriction enzymes. The digests thus obtained are
separated on a gel revealing a pattern of bands.
Differences in the pattern indicate the presence of
differences in the restriction sites present in the
polynucleotide and thus changes in the sequence thereof.
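
The comparison of restriction patterns described above can be sketched in silico. The following minimal Python example digests a wild type and a sample sequence at a recognition site and compares the resulting fragment lengths. It is a simplified illustration and not part of the described method: the toy sequences are invented, the function names `digest` and `has_rflp` are hypothetical, and the cut is placed at the start of each recognition site, whereas a real enzyme cuts at a fixed offset within or near its site.

```python
def digest(sequence: str, site: str) -> list:
    """Return the sorted fragment lengths after cutting at every `site`.

    Simplification: the cut is placed at the first base of each site.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Skip zero-length fragments (a site at the very start of the sequence).
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


def has_rflp(wild_type: str, sample: str, site: str) -> bool:
    """True if the two digests give different fragment length patterns."""
    return digest(wild_type, site) != digest(sample, site)


SITE = "GAATTC"  # EcoRI recognition sequence
wild_type = "AAAAGAATTCAAAAAAGAATTCAA"   # invented sequence with two sites
sample = "AAAAGAATTCAAAAAAGAGTTCAA"      # point mutation destroys the second site

print(digest(wild_type, SITE))   # three fragments
print(digest(sample, SITE))      # two fragments
print(has_rflp(wild_type, sample, SITE))
```

A point mutation outside every recognition site leaves the pattern unchanged in this sketch, which is why the text treats RFLP as only a first level of comparison.
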
Deletions can be detected by means of any
nucleic acid amplification technique such as the
Polymerase Chain Reaction (PCR). For this, probes are
identified corresponding to various parts of the gene to
be diagnosed, for example exons. Amplification between a
set of probes will only occur if the part of the gene to
which a selected set of probes should hybridize is still
present. In addition or as an alternative, the length of
the amplified fragment indicates whether any part is
deleted.
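
The reading of such a deletion PCR can be sketched as a simple decision rule. The following minimal Python example classifies each amplicon as present, absent or shortened by comparing the observed product size with the expected one. It is a sketch under stated assumptions: the amplicon names, the expected sizes, the tolerance and the function name `call_amplicons` are all hypothetical and are not taken from the tables of this application.

```python
def call_amplicons(expected: dict, observed: dict, tolerance: int = 10) -> dict:
    """Classify each amplicon by comparing observed and expected sizes (bp).

    `observed` maps an amplicon name to its product size, or to None when
    no product was obtained.
    """
    calls = {}
    for name, expected_bp in expected.items():
        observed_bp = observed.get(name)
        if observed_bp is None:
            # No product: the target region may be deleted.
            calls[name] = "absent"
        elif expected_bp - observed_bp > tolerance:
            # Product smaller than expected: part of the region may be deleted.
            calls[name] = "shortened"
        else:
            calls[name] = "present"
    return calls


# Hypothetical expected product sizes for three exon-specific amplicons.
expected = {"exon 1": 250, "exon 2": 310, "exon 3": 420}
# Hypothetical result: no product for exons 1 and 2, normal product for exon 3.
observed = {"exon 1": None, "exon 2": None, "exon 3": 418}

print(call_amplicons(expected, observed))
```

Absence of a product is most informative when only one copy of the gene is present, as for an X-linked gene in a male patient; where a second, intact copy exists it can still yield a normal product.
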

Point mutations can be identified by more
sophisticated techniques such as SSCP (single-strand
conformation polymorphism screening), heteroduplex
analyses, DNA chips, chemical and enzymatic methods,
sequencing of PCR products, denaturing gradient gel
electrophoresis or other state-of-the-art methods that
may become available in the future.

Other diagnostic methods comprise the in situ
detection of physical changes like translocations,
inversions or deletions. Translocations can be detected
by hybridizing a set of chromosomes with a first probe
that hybridizes to a part of the glypican gene that is
not likely to be involved in the translocation, inversion
or deletion and a second probe that hybridizes to a part
of the glypican gene that is likely to be involved in
such aberration. When a translocation has occurred the
second probe will be found on another chromosome than the
first one. If the probable translocation partner is
identified, an additional set of probes can be used which
hybridize to the translocated part and the remaining
part, respectively, of the translocation partner and
bear a different label from the first set of probes.
Upon translocation one of the probes of the first set
will be found on the chromosome of the translocation
partner and one probe of the second set will be found on
the chromosome of the glypican gene and vice versa.

Identifying inversions and deletions works in a
similar way with two probes, one that hybridizes to a
part of the gene that is not likely to be involved in the
inversion or deletion and a second probe that hybridizes
to a part that is likely to be involved in such
aberration. In case of an inversion the second probe will
be found closer to or further away from the first probe
than in a non-aberrant chromosome. In the case of
deletions one of the probes will be missing on the
aberrant chromosome. The description and examples of this
application give various examples of "parts of the gene
that are likely to be involved in an aberration".
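
The two-probe logic described above for translocations, inversions and deletions can be summarized as a small decision rule. The following minimal Python sketch is an illustration only: the representation of a signal as a (chromosome, position) pair, the arbitrary position units, the tolerance and the function name `classify` are all assumptions, and a changed probe distance is treated here simply as an inversion although other rearrangements could also alter it.

```python
from typing import Optional, Tuple

# A signal is (chromosome, position in arbitrary units), or None if no signal.
Signal = Optional[Tuple[str, float]]


def classify(first: Signal, second: Signal,
             normal_distance: float, tolerance: float = 0.1) -> str:
    """Interpret the signals of the first (reference) and second probe."""
    if first is None or second is None:
        # One of the probes is missing on the chromosome.
        return "deletion"
    if first[0] != second[0]:
        # The second probe is found on another chromosome than the first.
        return "translocation"
    distance = abs(first[1] - second[1])
    if abs(distance - normal_distance) > tolerance * normal_distance:
        # The second probe lies closer to or further from the first probe.
        return "inversion"
    return "normal"


print(classify(("X", 10.0), ("X", 10.5), normal_distance=0.5))
print(classify(("X", 10.0), ("13", 4.0), normal_distance=0.5))
print(classify(("X", 10.0), ("X", 12.0), normal_distance=0.5))
print(classify(("X", 10.0), None, normal_distance=0.5))
```
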

Diagnosis can also be performed at the level of
the (potentially absent or aberrant) protein encoded by
the glypican gene. Antibodies directed to the gene
product or protein can be used on Western blots to detect
the presence of the protein in the cell or to assess the
amount of protein present.

The diagnostic tests of the invention can be
performed on various source materials. RFLP, deletion
PCR, SSCP and chromosome analyses are for example
performed on blood cells or tissue biopsy samples of the
patient and his or her family. Furthermore, tumor cells
and normal cells of these subjects may be used. For
protein analysis, tissue samples, sera, tissue fluids of
patients and family, pleural exudates, ascites etc. may
be used.

All these diagnostic methods are based on the
information that can be derived from the various genes
and the gene products thereof. This information is given
in the following figures.

Figure 1 shows the nucleotide sequence of the
glypican-6 cDNA, comprising the coding sequence of the
newly identified GPC6 gene, and the predicted amino acid
sequence.

Figure 2 shows the nucleotide sequence of the
glypican-4 cDNA, comprising the coding sequence of the
GPC4 gene, and the predicted amino acid sequence.

Figure 3 shows the nucleotide sequence of the
human glypican-1 cDNA, comprising the coding sequence of
the GPC1 gene, and the predicted amino acid sequence
(GenBank accession number X54232; David et al., J. Cell
Biol. 111, 3165-3176, 1990).

Figure 4 shows the nucleotide sequence of the
human glypican-3 cDNA, comprising the coding sequence of
the GPC3 gene, and the predicted amino acid sequence
(GenBank accession number Z37987).

Figure 5 shows the nucleotide sequence of the
human glypican-5 cDNA, comprising the coding sequence of
the GPC5 gene, and the predicted amino acid sequence
(GenBank accession number U66033; Veugelers et al.,
Genomics 40, 24-30, 1997).

Figure 6 shows the alignment of the predicted
amino acid sequences of the members of the glypican
family. GPC1 is human glypican (David et al., 1990), GPC3
is the translated ORF of MXR7 (GenBank #Z37987) and the
human homologue of rat OCI-5, GPC4 is human glypican-4
(see example 2) and the human homologue of K-glypican
(Watanabe et al., 1995), GPC5 is human glypican-5
(Veugelers et al., 1997). The set of fourteen conserved
cysteines and the putative glycosaminoglycan attachment
sites are indicated by underlining. Serines occurring in
SGXG sequence contexts are indicated in bold. Alignment
was done using the Clustal V program (Higgins, 1994).
Single carets under the sequences indicate exon-intron
boundaries occurring within codons; double carets
indicate exon-intron boundaries occurring between codons.

In the following examples reference is made to
the following additional figures.

Figure 7A shows a Northern blot for GPC6 of
human fetal Brain (lane 1), Lung (lane 2), Liver (lane 3)
and Kidney (lane 4) RNA. The positions of RNA size
markers (kb) are indicated in the abscissa.

Figure 7B shows a Northern blot for GPC6 of
human adult Heart (lane 1), Brain (lane 2), Placenta
(lane 3), Lung (lane 4), Liver (lane 5), Skeletal Muscle
(lane 6), Kidney (lane 7) and Pancreas (lane 8) RNA.

Figure 7C shows a Northern blot for GPC6 of
human adult Spleen (lane 1), Thymus (lane 2), Prostate
(lane 3), Testis (lane 4), Ovary (lane 5), Small
Intestine (lane 6), Colon Mucosal Lining (lane 7), and
Peripheral Blood Leukocyte (lane 8) RNA. The positions of
RNA size markers (kb) are indicated in the abscissa.

Figure 8 shows heparan sulfate core protein
expression in control and GPC-transfectant Namalwa cells.
Western blotting of non-digested, heparitinase-digested,
doubly heparitinase- and chondroitinase ABC-digested, and
chondroitinase ABC-digested proteoglycan fractions, using
the delta-HS-specific mAb 3G10. This antibody reacts with
the desaturated uronates that are generated by
heparitinase and that remain in association with the core
protein after the enzyme treatment and during
electrophoresis. After heparitinase it therefore detects
all heparan sulfate proteoglycan core proteins present in
the sample. The positions of protein size markers are
indicated in the abscissa. Hase: heparitinase; Case:
chondroitinase ABC. Control: wild type Namalwa cells (wt)
and Namalwa cells transfected with pREP4; GPC1, GPC4
and GPC6: Namalwa cells transfected with, respectively,
the plasmids glyp1-pREP4, glyp4-pREP4 and glyp6-pREP4.

Figure 9 shows heparan sulfate expression in
control and GPC-transfectant Namalwa cells. FACS analysis
of non-digested and heparitinase-digested cells, using
the native HS-specific 10E4 antibody (non-digested cells
(-•-•-)) and delta-HS-specific 3G10 antibody (digested
cells ( )). Control: Namalwa cells transfected with
pREP4; GPC1, GPC3, GPC4, GPC5 and GPC6: Namalwa cells
transfected with, respectively, the plasmids glyp1-pREP4,
glyp3-pREP4, glyp4-pREP4, glyp5-pREP4 and glyp6-pREP4.

Figure 10 shows the chromosomal localization of
GPC6 to chromosome band 13q32 as a photo (A) and a
schematic representation thereof (B). Arrows indicate
colored bands.

Figure 11A shows the Northern blot for GPC4 of
human fetal Kidney (lane 1), Liver (lane 2), Lung (lane
3) and Brain (lane 4) RNA. The upper part of the figures
represents the hybridization with the GPC4 probe; the
lower part the hybridization with a β-actin control
probe.

Figure 11B shows the Northern blot for GPC4 of
human adult Heart (lane 1), Brain (lane 2), Placenta
(lane 3), Lung (lane 4), Liver (lane 5), Skeletal Muscle
(lane 6), Kidney (lane 7) and Pancreas (lane 8) RNA.

Figure 11C shows the Northern blot for GPC4 of
human adult Spleen (lane 1), Thymus (lane 2), Prostate
(lane 3), Testis (lane 4), Ovary (lane 5), Small
Intestine (lane 6), Colon Mucosal Lining (lane 7) and
Peripheral Blood Leukocyte (lane 8) RNA. The positions of
RNA size markers (kb) are indicated in the abscissa.

Figure 12A illustrates the chromosomal
localization of GPC4 to chromosome Xq26 and the relative
order of GPC3 and GPC4. For initial chromosomal
localization of GPC4, FISH was performed using either BAC
35H9 or BAC 68G14 on metaphase spreads prepared from
PHA-stimulated normal peripheral blood leukocytes (Figure
12A). For relative ordering of GPC genes FISH was
performed with BACs for GPC4 (35H9 and 68G14, labeled in
red) and BACs for GPC3 (166D10 and 36D20, labeled in
green) on PHA-stimulated cell lines GM3884, GM13034 and
GM0097 (Figures 12B, 12C and 12D). Chromosomes were
counterstained with DAPI, and the images were taken using
a cooled CCD device. Arrows indicate the positive signals
at chromosome Xq26.

Figures 13A and 13B show a BAC/PAC contig
linking GPC4 to GPC3 on Xq26. Figure 13C shows glypican
deletions found in SGBS patients. STSs are indicated by
black circles; exons are indicated by grey squares. Not
drawn to scale (the distance between SWXP1698 and exon 8
of GPC3 is approximately 250 kb; see Pilia et al., 1996).

The following tables that precede the
references give inter alia suitable primers for use in
the various diagnostic methods:

Table 1 shows the primers used in 5'-RACE
experiments for the identification of GPC6.

Table 2 shows the percentages of amino acid
identities between glypicans.

Table 3 shows the primers used in the RACE
experiments with GPC4.

Table 4 shows the gene-specific primers used
for sequencing of the GPC4 gene.

Table 5 shows the novel STSs MV1, MV2 and MV3.

Table 6 shows localization of FISH signals in
SGBS patients.

Table 7 shows the intron-exon organization of
GPC4.

Table 8 shows primers to be used in deletion
analysis of GPC3 and GPC4.

Table 9 shows the primers for use in SSCA of
GPC4.

Table 10 shows the results of deletion and SSCP
screening in 8 patients with SGBS.

Table 11 shows primer pairs for deletion PCR of
GPC5.

Table 12 shows primer pairs for deletion PCR of
GPC6.

The present invention will be further
illustrated in the accompanying examples that are in no
way intended as a limitation of the present invention.



EXAMPLES 
EXAMPLE 1

Identification and characterization of glypican-6
1.1 Introduction

The isolation of one of the cDNAs of the
present invention (GPC6) started from EST database
entries showing significant similarity with (cDNA coding
for) glypicans. The cDNA was found in a cDNA library of
fetal brain.

1.2 Materials and methods

1.2.1 Bioinformatics

EST entries (including homology data) were
retrieved from dbEST using a text string based query
interface (http://www.ncbi.nlm.nih.gov/dbEST/index.html).
Protein alignments were made using the program Clustal.
DNA alignments were made using the program GENEPRO.

1.2.2 Molecular cloning of human GPC6

The primer sets used for the 5'-RACEs (5' rapid
amplification of cDNA ends), corresponding to two
GPC-like ESTs (GenBank Accession Nos. N87558 and
AA001322), are given in Table 1. The cDNAs were amplified
from a library of adaptor-ligated double strand human
fetal brain cDNA (Clontech, Palo Alto, CA) through a
two-step PCR protocol. In the first PCR a gene-specific
primer was used and an anchor primer provided by the
supplier. Then 1 µl of each first PCR was used as
template for a second PCR, using a second gene-specific
nested primer (cf. Table 1) and a nested anchor primer
provided by the supplier. The products of the second
steps were analyzed by electrophoresis in a 0.6% agarose
gel. Distinct bands were gel purified using the QIQUICK
II DNA clean-up System (Qiagen), T/A-cloned in the
plasmid pCR2.1 (Invitrogen), and sequenced using a
Pharmacia A.L.F. DNA Sequencer, with Dye Primer Cycle
Sequencing chemistry on double stranded plasmid
templates. In total, 5 independent clones from two
separate 5'-RACE experiments (2 from the 5'-RACE-1 and 3
from the 5'-RACE-2 experiment) were sequenced.

Clone zh83a06 from the Soares fetal
liver/spleen library (Lennon et al., 1996), which had
yielded EST No. AA001322, was obtained from the
I.M.A.G.E. Consortium
(http://www-bio.llnl.gov/bbrp/image/image.html) through
Research Genetics, Inc. (Huntsville, AL). This clone (ID:
427858) was completely sequenced, yielding residues
1835-2748 of the composite cDNA sequence that is shown in
figure 1.

1.2.3 BAC cloning and chromosomal localization of GPC6 by
FISH

To isolate genomic clones for GPC6, part of the
cDNA insert from I.M.A.G.E. clone 427858 was isolated
with EcoRI/PstI, gel purified, labeled and used to screen
a human genomic BAC library (Research Genetics, Inc.,
Huntsville, AL). Two BACs, 114A17 and 182F5, were
isolated. Their authenticity was verified by Southern
blotting. These results indicated that the BACs 114A17
and 182P5 contain exons 6-9 of GPC6. BAC DNA was labeled
with bio-16-dUTP (Sigma) by nick translation using a
commercial kit (Life Technologies, Gaithersburg, MD).

Metaphase spreads were prepared from
PHA-stimulated human peripheral blood lymphocytes
cultured for 72 hours. Prior to FISH, slides were treated
with RNAse A and pepsin as described (Wiegant et al.,
1991). Human Cot-1 DNA (Life Technologies) was used as a
competitor. Denaturation of the slides and probes,
hybridization, and subsequent cytochemical detection of
the hybridization signals were performed as previously
described (Vermeesch et al., 1995). Chromosomes were
counterstained with DAPI and the slides were mounted in
Vectashield mounting medium (Vector Laboratories Inc,
Burlingame, CA). The signal was visualized by digital
imaging microscopy using a cooled charge-coupled device
camera (Photometrics Ltd, Tucson, AZ). Merging and
pseudocoloring were performed using the Smart Capture
software (Vysis, Stuttgart, Germany).

1.2.4 Northern blotting

The membranes for the Northern blots were
obtained from Clontech. Hybridization was performed for
two hours at 68°C, using Expresshyb solution (Clontech)
according to the manufacturer's specifications. The probe
was either a 32P-oligolabeled BamHI-XbaI fragment from
the I.M.A.G.E. clone 427858 (corresponding to residues
2147-2488 of the GPC6 sequence) or a HindIII-BamHI
fragment from the GPC6 composite cDNA sequence
(corresponding to residues 1724-2147 of the GPC6
sequence). Dehybridisation included two washes with 2.0x
SSC, 0.05% SDS (5 min at RT; 30 min at 60°C) and a high
stringency wash with 0.1x SSC, 0.1% SDS (30 min at 65°C).

1.2.5 Construction of expression plasmids and cell
transfection

The NotI-BstEII and BstEII-AvaI fragments from
overlapping RACE clones, and the AvaI-HindIII fragment of
I.M.A.G.E. clone 427858 were ligated together in pCR2.1.
NotI-XbaI and XhoI-XbaI fragments from this construct,
containing the Kozak sequence and initiator ATG, the full
coding sequence and the stop codon, were subcloned in,
respectively, pcDNAIII and pBluescript. For episomal
expression in Namalwa cells, the full length cDNA was
released from pBluescript with KpnI and NotI, and ligated
into pREP4, yielding the plasmid glyp6-pREP4.

Namalwa cells (ATCC CRL 1432) were routinely
grown in DMEF12 medium supplemented with 10% FCS and
L-glutamine. For transfection, the cells were prewashed
with Ca- and Mg-free PBS and incubated for 10 min at 4°C
(10^7 cells in 1 ml Ca/Mg-free PBS) with 30 µg
glyp6-pREP4 plasmid before electroporation at 240 V and
960 µF (Gene Pulser, Biorad).

Selection was started 48 h later with 250 µg/ml
of hygromycin B. Stable transfection was achieved after
12 days. Expression of heparan sulfate in the
transfectants was analyzed by FACS, using the HS-specific
antibody 10E4 and the delta-HS-specific antibody 3G10,
and by Western blotting, using the 3G10 antibody as
described before (David et al., 1992; Steinfeld et al.,
1996). Stable expressing transfectant Namalwa clones were
isolated from cells transfected with linearised
glyp6-pcDNAIII, selected in media containing 400 µg/ml of
G418, and panned on 10E4 antibody. (The HS-specific
antibodies 10E4 and 3G10 have been isolated,
characterized and produced in our own laboratory (David
et al., 1992). They are now also commercially available
from Seikagaku Co.)

1.3 Results

1.3.1 Molecular cloning of glypican-6 (GPC6), a novel
glypican

During the screening of public EST databases,
it was noticed that there was one EST from human fetal
heart (GenBank Acc. No. N87558; clone not available) and
one EST from fetal liver/spleen (GenBank Acc. No.
AA001322; available as clone 427858 from the I.M.A.G.E.
Consortium) with very high homology (but not identity)
to human GPC1 and GPC4 (approximately 70% identity at the
nucleotide and encoded amino acid sequence level). It
was assumed that these might represent (a) novel
glypican(s), and primers (annealing to regions with
significant sequence divergence from GPC1 and GPC4) were
designed which would amplify the corresponding cDNA(s)
(see Table 1). These primers were used on the same human
fetal brain cDNA library that was used for the isolation
of GPC5 (Veugelers et al., 1997) and human GPC4 (example
2), in RACE experiments.

From the analysis of their sequences, it
appeared that all clones from these RACE experiments and
the I.M.A.G.E.-clone 427858 represented overlapping
cDNAs.

Figure 1 represents the merged sequences of
these clones, and the predicted structure of the protein
encoded by the message that corresponds to this cDNA. The
sequence features an ATG start codon, in a Kozak sequence
context, at position 586 and a TAA stop codon at position
2251. Two AATAAA sequences (potential polyadenylation
signals) are present at positions 2598 and 2690. The open
reading frame in the sequence codes for a protein of 555
residues. The protein sequence starts and terminates with
hydrophobic signal peptide-like sequences. It contains no
asparagines that correspond to potential N-glycosylation
sites, and contains four serine-glycine dipeptide
sequences. All four Ser-Gly dipeptide sequences occur
towards the C-terminus of the protein, and form part of a
direct Ser-Gly repeat sequence. This Ser-Gly tetra-repeat
sequence is flanked, both upstream and downstream, by
acidic amino acids (D/E), reproducing a motif that has
been reported to promote the assembly of heparan sulfate
in proteoglycans. The downstream acidic residues occur
within the sequence CMDDVC, and may reproduce a motif (a
small acidic loop supported by a disulfide bond) that is
shared by most glypicans (except glypican-2). This loop
follows the SG repeats in the glypicans 1, 4 and 6, but
interrupts or precedes the SG repeats in the glypicans 3
and 5.
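
The open reading frame length quoted above can be checked with a short
sketch. It assumes 1-based coordinates in which the start position is the
A of the ATG and the stop position is the first base of the stop codon;
the function name is illustrative and not part of the original disclosure.

```python
def orf_codon_count(start_pos: int, stop_pos: int) -> int:
    """Number of amino-acid codons between a start codon and a stop codon.

    start_pos: 1-based position of the first base of the ATG (assumed convention).
    stop_pos:  1-based position of the first base of the stop codon (assumed convention).
    """
    span = stop_pos - start_pos
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


# Positions quoted in the text for the GPC6 composite cDNA (figure 1).
print(orf_codon_count(586, 2251))  # 555
```

Under these assumptions the positions 586 and 2251 are in frame and give
the 555 residues stated for the predicted protein.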

Alignment of this predicted protein sequence
with the protein sequences of the other known members of
the glypican family (glypicans 1, 3, 4, and 5 as
identified in man, and glypican-2 as identified in rat)
revealed significant sequence similarities (figure 6).
This similarity included the 14 cysteines and the
position and identity of several additional amino acid
residues that are conserved in all glypicans identified
so far. The entire protein showed 63% sequence
identity to human glypican-4, 44% identity to human
glypican-1, and 24-25% identity to the human glypicans 3
and 5. Comparison with rat glypican-2 showed only 41%
identity (Table 2). Since (where both were available) human
and rodent glypican sequences had always proven highly
similar (~90% sequence identity), it seemed unlikely
that this protein represented the human homologue of
cerebroglycan (glypican-2). This protein was therefore
designated as glypican-6, the sixth member of the
vertebrate glypican family.
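
As an illustration of how a pairwise identity figure of this kind can be
derived from an alignment, the following is a minimal sketch. It assumes
that identity is counted over aligned columns in which neither sequence
has a gap; alignment programs differ in this convention, and the text does
not state which one was used. The two strings are toy fragments built
around the CMDDVC and CEYQQC motifs mentioned in this document, not real
alignment output.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (illustration only).
print(percent_identity("CMDDVC-SG", "CEYQQCASG"))  # 50.0
```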

This alignment further indicated that the
glypicans 4 and 6 were more closely related to one
another (63% identity) than to the other glypicans (only
20-40% identity), and that the glypicans 3 and 5 were
more closely related to one another (43% identity) than
to the other glypicans (~20% identity) (see Table 2). The
high similarity of glypican-4 and glypican-6 became even
more striking when the N-terminal and C-terminal
hydrophobic signal peptide-like sequences (absent from the
mature proteins) were excluded from the alignments
(identity rising to 70%). These data suggest that the
glypican family of cell surface heparan sulfate
proteoglycans (as known today) may be composed of
discrete subfamilies: one comprising glypicans 4 and 6,
and possibly also glypican-1 (and 2); the second
comprising glypicans 3 and 5.

1.3.2 Expression of glypican-6 

In Northern blotting experiments (figures 7A,
7B and 7C), two different GPC6 probes (one from the 3'
UTR, corresponding to residues 2147-2488 of the sequence
shown in figure 1, the other corresponding to residues
1724-2147 of the sequence shown in figure 1) detected a
transcript of ~7 kb in all fetal tissues analyzed
here. High levels of expression were apparent in fetal
kidney, moderate levels in fetal lung and fetal liver,
and a low level of expression in fetal brain (figure 7A).
In adult tissues the message appears to be expressed
almost ubiquitously. The message is expressed at very
high levels in ovary, and at high levels in liver,
kidney, small intestine and colon (mucosal lining). The
message is also present at low levels in heart, brain,
placenta, lung, skeletal muscle, pancreas, spleen,
thymus, prostate and testis (figures 7B and C). The
message is undetectable in peripheral blood leukocytes.
In adult kidney the probes also detected a second less
abundant message of approximately 5.8 kb. Adult heart and
adult skeletal muscle yielded an extra band of ~3.9 kb.

These data indicated differential and only
partially overlapping expressions of the GPC6 and other
GPC messages in the different tissues, further evidence
that the various GPCs are distinctive transcripts.

1.3.3 Identification of glypican-6 as a heparan sulfate
proteoglycan

To test if glypican-6 would drive the synthesis
of heparan sulfate, the glypican-6 insert was subcloned
in the pREP4 episomal expression vector and transfected
in Namalwa cells. The Namalwa cells used for these
experiments had previously been shown to express little
endogenous heparan sulfate, but to support the synthesis
of large amounts of heparan sulfate when transfected with
cDNAs (cloned in pREP4) that code for syndecans or
glypican-1. Analysis of heparitinase-digested
proteoglycan from transfectant and control cells by
Western blotting, using the delta-HS-specific mAb 3G10,
confirmed that the transfectant cells expressed heparan
sulfate proteoglycan core proteins of ~65 kDa (major
band) and ~18-14 kDa (minor bands, possibly metabolic
degradation products) that were not detectable in the
control cells (figure 8). Major bands of ~65 kDa and
minor bands of smaller sizes were also observed for
transfectant Namalwa cells expressing glypican-1 or
glypican-4 (figure 8) and glypican-3 or glypican-5 (not
shown). FACS analyses of these cells with the HS-specific
mAb 10E4 and, after heparitinase, with mAb 3G10 revealed
a dramatic increase in the expression of cell surface HS
in the transfectants (figure 9).

1.3.4 Chromosomal mapping of GPC6 

Two BACs for GPC6, 114A17 and 182F5, were used
to localize GPC6 to chromosome band 13q32 by fluorescent
in situ hybridization on metaphase chromosomes (figure
10; BAC 114A17). From this it follows that GPC5 (closely
related to GPC3) and GPC6 (closely related to GPC4) map
in close proximity of one another on 13q32, mimicking the
clustering of the GPC3 and GPC4 genes on chromosome Xq26
(see example 2).

EXAMPLE 2

Identification and characterization of human glypican-4

2.1 Introduction

The isolation of the cDNA that is used in the
present invention for diagnostics (GPC4) started from a
partial cDNA for human glypican-4. A cDNA comprising the
complete coding sequences for human glypican-4 was found
in a cDNA library of fetal brain.

2.2 Materials and methods

2.2.1 Bioinformatics 

EST entries (including homology data) were
retrieved from dbEST using either a text string based
query interface
(http://www.ncbi.nlm.nih.gov/dbEST/index.html), or by
BLAST searches using the BLAST-server
(http://www.ncbi.nlm.nih.gov/BLAST/). Protein alignments
were made using the program Clustal (Higgins et al.,
1994). DNA alignments were made using the program
GENEPRO.

2.2.2 Molecular cloning of human GPC4 

A partial cDNA for human GPC4 was obtained by
PCR on a human fetal kidney library (pKGP-PCR). The
sequence of this cDNA was used to design the primers for
the RACE experiments and the isolation of cDNA for the
complete coding sequence of human GPC4. The 5'-RACE and
3'-RACE experiments were performed on a library of
adaptor-ligated ds fetal brain cDNA, using the Marathon
cDNA Amplification kit from Clontech (Palo Alto, CA). The
cDNAs were amplified through a two-step PCR protocol. The
first PCR used a gene-specific primer (Table 3) and an
anchor primer provided by the supplier. Then 1 µl of the
first PCR reaction was used as template for the second
PCR reaction, using a second gene-specific nested primer
and a nested anchor primer provided by the supplier. The
products of the second PCR were analyzed by
electrophoresis in a 0.6% agarose gel. Distinct bands
were gel purified using the Qiaquick DNA clean-up system
(Qiagen), T/A-cloned in the plasmid pCR2.1 (Invitrogen),
and sequenced using a Pharmacia A.L.F. DNA Sequencer,
with Dye Primer Cycle Sequencing chemistry on double
stranded plasmid templates. In total 3 independent RACE
clones were sequenced for the 5'-RACE and 3 independent
RACE clones were sequenced for the 3'-RACE.

Additionally, clone zx12d12 from the Soares 9
week normal fetus cDNA library (Lennon et al., 1996) was
obtained from the I.M.A.G.E. Consortium
(http://www.bio.llnl.gov/bbrp/image/image.html) through
Research Genetics, Inc. (Huntsville, AL). This clone (ID:
786263) was completely sequenced, yielding residues 1443-2315
of the composite cDNA sequence shown in figure 2.
All the sequences obtained for the coding region
(residues 213-1883) were derived from at least two
different RACE-products.

2.2.3 BAC cloning and chromosomal localization of GPC4 by
FISH

To isolate genomic clones for GPC4, the pKGP-PCR
probe (corresponding to residues 422-1497 of the GPC4
cDNA sequence shown in figure 2) and the NotI-BglII
fragment (residues 1-386) of the GPC4 cDNA were gel
purified, 32P-labelled and used to screen a human genomic
BAC library (Research Genetics, Inc., Huntsville, AL). Two
BACs, 35H9 and 151D8, were isolated with the PCR probe,
and one BAC, 68G14, with the NotI-BglII fragment. The
authenticity of these clones was verified by Southern
blotting and by cycle-sequencing of the exon-intron
boundaries, using gene-specific primers derived from the
GPC4 cDNA sequence (Table 4). BAC DNA was labeled with
bio-16-dUTP (Sigma) by nick-translation, using a
commercial kit (Life Technologies, Gaithersburg, MD).

A similar strategy was used to isolate BACs for
GPC3. Using cDNAs corresponding to residues 1-2300 and
1-408 of the GPC3 sequence (Genbank Access N° Z37987), two
BACs were identified: BAC 166D10, which contained exon-3
of GPC3, and BAC 36D20, which contained exon-2 of GPC3.

Metaphase spreads were prepared from
PHA-stimulated human peripheral blood lymphocytes cultured
for 72 hours. Prior to FISH, slides were treated with
RNAse A and pepsin as described (Wiegant et al., 1991).
Human Cot1 DNA (Life Technologies) was used as a
competitor. Denaturation of the slides and probes,
hybridization, and subsequent cytochemical detection of
the hybridization signals were performed as previously
described (Vermeesch et al., 1995). Chromosomes were
counterstained with DAPI and the slides were mounted in
Vectashield mounting medium (Vector Laboratories Inc,
Burlingame, CA). The signal was visualized by digital
imaging microscopy using a cooled charge-coupled device
camera (Photometrics Ltd, Tucson, AZ). Merging and
pseudocoloring were performed using the Smart Capture
software (Vysis, Stuttgart, Germany).

2.2.4 GPC4 gene structure and BAC/PAC contig of the
GPC3/GPC4 gene cluster on Xq26

Exon-intron boundaries were determined by
cycle-sequencing of BAC DNA using gene specific primers.
Alternatively, BAC DNA was subcloned in plasmids,
verified for the presence of GPC4 exons (by PCR and
Southern blotting) and subsequently sequenced.

YAC's yWXD363, yWXD2789-I, yWXD440, yWXD736,
yWXD69, yWXD808, yWXD6857-I, yWXD6858-I, yWXD3373,
yWXD2704-I, yWXD6142, yWXD2724-I from the Xq26 contig
(Pilia et al., 1996) were obtained from the American Type
Culture Collection (ATCC) and verified for GPC4 content
by PCR and Southern blotting.

The ends of BACs 35H9 and 68G14 were sequenced
and used to construct the novel sequence-tags (STS) MV1,
MV2 and MV3 (Table 5).

The PAC library from P. de Jong was screened
with 32P-oligolabeled probes for MV1, exon-1 GPC4, exon-2
GPC4 and exon-8 GPC3. PAC content was verified by PCR.
STS's for GPC4 exons are described in Table 5. STS's
sWXD1698, sWXD1165 and sWXD2342 have been described by
others (see The Genome Database http://www.gdb.org). The
reaction cycles for STS's MV1, MV2 and MV3 were: 94°C for
30 sec, 55°C for 30 sec, 72°C for 30 sec, for 35 cycles.
Cycling was preceded by a 150 sec incubation at 94°C.
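
The cycling conditions given above for the MV1, MV2 and MV3 STS's can be
written down as a small data structure, which also makes it easy to total
the programmed hold time. This is only a sketch: the class and function
names are illustrative, and ramping time between temperatures (which
depends on the instrument) is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # programmed hold time in seconds


# Values quoted in the text for the STS's MV1, MV2 and MV3.
INITIAL_DENATURATION = Step(94, 150)
CYCLE = (Step(94, 30), Step(55, 30), Step(72, 30))
N_CYCLES = 35


def total_hold_seconds(initial: Step, cycle: tuple, n_cycles: int) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


print(total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES))  # 3300
```

The programmed holds therefore add up to 3300 seconds (55 minutes) before
any instrument-specific ramping is taken into account.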

2.2.5 Northern blotting 

The membranes for the Northern blots were
obtained from Clontech. Hybridization was performed for
two hours at 68°C, using Expresshyb solution (Clontech)
according to the manufacturer's specifications. The probe
was either a 32P-oligolabeled NotI-BglII fragment from one
of the 5' RACE clones (corresponding to residues 1-386 of
the sequence shown in figure 2), or a 32P-oligolabeled
BamHI-BamHI fragment from the composite cDNA sequence
constructed in pREP4 (corresponding to residues 1148-2291
of the sequence shown in figure 2). Dehybridisation
included washing at room temperature for 30 min with 2.0%
SSC, 0.05% SDS and a high stringency wash for 30 min at
0.1% SSC, 0.1% SDS and 65°C.

2.2.6 Mutational analysis of the GPC4 gene 

From the characterization of the corresponding
intron/exon boundaries in GPC4, primer pairs were
designed for the amplification of all exons of the human
GPC4 gene, to permit deletion and mutational analysis
(Table 8). For GPC3 deletion analysis, we designed new
primers for the amplification of all exons (Table 8).
Genomic DNA was obtained from one newly identified
patient, counseled at the Center for Human Genetics (CME)
of the University of Leuven (with informed consent from
the parents); from the lymphoblastoid cell lines AG0817,
AG0857, AG0893, AG0946, AG0969, and FY0367 (database IDs)
from the European Collection of Cell Cultures (ECACC);
and from the fibroblastic cell lines GM13034, GM3884,
GM0097 (ATCC), all established from patients with SGBS.
All patient DNAs were analyzed by PCR. The reaction
cycles were: 94°C for 30 sec, annealing temperature for
30 sec, 72°C for 30 sec, for 35 cycles. Cycling was
preceded by a 2.5 minute incubation at 94°C. The reaction
products were analyzed by electrophoresis in 2% agarose
gels or, alternatively, were analyzed for single-strand
conformation polymorphisms (SSCP) in non-denaturing
polyacrylamide gels as described previously (Matthijs et
al., 1997). PCR products with variant SSCs and controls
were sequenced, either directly after gel purification or
after T/A cloning in pCR2.1 (Invitrogen). In the latter
case, several independent clones from independent
amplifications were characterized by Dye Primer Cycle
Sequencing.

2.2.7 Construction of expression plasmids and cell
transfection

To construct a partial GPC4 cDNA, containing
the entire coding region, a NotI-EcoRI fragment from a 5'
RACE clone, an EcoRI-PstI fragment from pKGP, and a
PstI-BamHI fragment from a 3'-RACE clone were ligated together
in pBluescript. A NotI-NotI fragment containing GPC4 was
isolated from this construct and ligated in pcDNAIII and
pREP4, yielding respectively glyp4-pcDNAIII and
glyp4-pREP4. Namalwa cells (ATCC CRL 1432) were routinely grown
in DMEF12 medium supplemented with 10% FCS and
L-glutamine. For transfection, the cells were prewashed
with Ca- and Mg-free PBS and incubated for 10 min at 4°C
(10^7 cells in 1 ml Ca/Mg free PBS) with 30 µg glyp4-pREP4
plasmid before electroporation at 240 V and 960 µF (Gene
Pulser, Biorad). Selection was started 48 h later with
250 µg/ml of hygromycin B. Stable transfection was
achieved after 12 days. Expression of heparan sulfate in
the transfectants was analyzed by FACS, using the
HS-specific antibody 10E4 and the delta-HS-specific antibody
3G10, and by Western blotting, using the 3G10 antibody as
described before (David et al., 1992; Steinfeld et al.,
1996). Stable expressing transfectant Namalwa clones were
isolated from cells transfected with linearized
glyp4-pcDNAIII, selected in media containing 400 µg/ml of G418,
and panned on 10E4 antibody.

2.3 Results 

2.3.1 Molecular cloning of human glypican-4 (GPC4)

The combination of 5'-RACE and 3'-RACE
experiments, performed on a library of adaptor-ligated ds
fetal brain cDNA, yielded the complete coding
sequence for human GPC4. Figure 2 represents the merged
sequences of the RACE clones, pKGP and EST zx12d12
(identified as GPC4 from BLAST searches of public
databases), and the predicted structure of the protein
encoded by the message that corresponds to this cDNA. The
sequence features an ATG start codon, preceded by a Kozak
sequence, at position 213 and a TAA stop codon at
position 1881. One AATAAA sequence (potential
polyadenylation signal) is present at position 3697, and
a stretch of polyA starts at position 3706. The predicted
amino acid sequence of human GPC4 (556 residues) was
found to be highly homologous to that of mouse GPC4
(K-glypican) (93.5% sequence identity) (see also Table
2).

The protein sequence starts and terminates with
hydrophobic signal peptide-like sequences. It contains
three serine-glycine dipeptide sequences. All three
Ser-Gly dipeptide sequences occur towards the C-terminus of
the protein, and two of these form part of a direct
Ser-Gly repeat sequence. These Ser-Gly sequences are flanked,
both upstream and downstream, by acidic amino acids
(D/E), reproducing a motif that has been reported to
promote the assembly of heparan sulfate in proteoglycans
(Zhang et al., 1995). Because of the presence of three
Ser-Gly repeats, glypican-4 would be predicted to have up
to three heparan sulfate chains implanted on its core
protein. The acidic residue downstream of the Ser-Gly
repeat occurs within the sequence CEYQQC, and may
reproduce a motif (a small acidic loop supported by a
disulfide bond) that is shared by most glypicans (except
glypican-2). This loop follows the SG repeats in the
glypicans -1, -4 and -6, but interrupts or precedes the
SG repeats in the glypicans -3 and -5.
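
The kind of motif described above (Ser-Gly dipeptides with acidic residues
nearby) can be located in a protein sequence with a simple scan. The
sketch below is illustrative only: the window of six residues on either
side is an arbitrary assumption, not a value taken from the text, and the
test sequence is a hypothetical toy string that merely ends in the CEYQQC
motif mentioned above.

```python
ACIDIC = set("DE")  # aspartate and glutamate


def ser_gly_sites(protein: str, window: int = 6):
    """Return (index, acidic_upstream, acidic_downstream) for each SG dipeptide.

    index is the 0-based position of the serine; the two booleans report
    whether a D or E occurs within `window` residues before or after the
    dipeptide (window size is an assumption made for illustration).
    """
    sites = []
    for i in range(len(protein) - 1):
        if protein[i:i + 2] == "SG":
            upstream = protein[max(0, i - window):i]
            downstream = protein[i + 2:i + 2 + window]
            sites.append((i,
                          bool(ACIDIC & set(upstream)),
                          bool(ACIDIC & set(downstream))))
    return sites


# Hypothetical toy sequence, not the GPC4 protein sequence.
print(ser_gly_sites("AEDLSGSGAKCEYQQC"))
# [(4, True, True), (6, True, True)]
```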

2.3.2 Expression of human glypican-4

In Northern blotting experiments, both probes,
corresponding either to residues 1-386 or to residues
1148-2291 of the GPC4 sequence, detected two messages,
one of 2.9 and one of 4.3 kb. The messages were
expressed in several, but not all, of the human fetal
and adult tissues tested (see figure 11). The origin of
these two bands is not known, but could be due to the
alternative usage of multiple polyadenylation signals,
alternative splicing or, less likely, cross-hybridization
with messages for other (possibly yet to be identified)
members of the glypican gene family. In fetal tissues the
messages were expressed in brain, kidney and lung, but
were barely detectable in liver. In adult tissues the message
is highly abundant in skeletal muscle, pancreas, kidney,
placenta, lung, heart, spleen, testis, ovary, colon and
small intestine. Less intense bands were seen in brain,
thymus and prostate, and barely detectable bands were
seen in the liver. The message appears to be absent from
peripheral blood leukocytes. EST's for human GPC4 were
also present in libraries prepared from a 9 week old
fetus, pregnant uterus, fetal heart, adult lung, placenta
and colon. The expression pattern of human GPC4 is almost
the same as that of murine K-glypican, with the exception
of mGPC4 being abundantly expressed in the liver.

2.3.3 Identification of glypican-4 as a heparan sulfate 
proteoglycan 

To test if glypican-4 would support the
synthesis of heparan sulfate, the glypican-4 insert was
subcloned in the pREP4 episomal expression vector and
transfected in Namalwa cells. The Namalwa cells used for
these experiments had previously been shown to express
little endogenous heparan sulfate, but to support the
synthesis of large amounts of heparan sulfate when
transfected with cDNAs (cloned in pREP4) that code for
syndecans or glypican-1. Analysis of heparitinase-digested
proteoglycan from transfectant and control
cells by Western blotting, using the delta-HS-specific
mAb 3G10, confirmed that the transfectant cells expressed
heparan sulfate proteoglycan core proteins of ~65 kDa
(major band) and ~18-14 kDa (minor bands) that were not
detectable in the control cells (figure 8). FACS analyses
of these cells with the HS-specific mAb 10E4 and, after
heparitinase, with mAb 3G10 revealed a dramatic increase
in the expression of cell surface HS in the transfectants
(figure 9).

2.3.4 Chromosomal mapping of GPC4

Three BACs were identified for GPC4: BAC
35H9, BAC 151D8, and BAC 68G12. BACs 35H9 and 151D8
contained exons 2 to 9 of GPC4, while BAC 68G12 contained
exon-1 of GPC4. FISH, performed on metaphase chromosomes,
localized all BACs for GPC4 to Xq26 (figure 12). Since
GPC3 had also been localized to chromosome band Xq26
(Pilia et al., 1996), these results suggested that GPC3
(closely related to GPC5) and GPC4 (closely related to
GPC6) probably mapped in proximity of one another on
Xq26, mimicking the clustering of the GPC5 and GPC6 genes
on chromosome 13q32 (see example 1). The relative
orientation of GPC4 and GPC3 was determined by FISH on
cell lines of Simpson-Golabi-Behmel syndrome (SGBS)
patients with translocations in the GPC3 gene (Table 6
and figure 12).

These FISH data indicate that the GPC4 gene
lies centromeric to GPC3. Since there is a YAC-contig
covering Xq26, it was decided to look for YAC's
containing GPC4. The following YAC's were tested by
Southern blotting and PCR for the presence of GPC4 exons:
yWXD363, yWXD2789-I, yWXD440, yWXD736, yWXD69, yWXD808,
yWXD6857-I, yWXD6858-I, yWXD3373, yWXD2704-I, yWXD6142-I,
yWXD2724-I.

Only YAC's yWXD3373 and yWXD6858-I were found
to be positive for exon-2 to exon-9 of GPC4. No YAC's
were found positive for exon-1 of GPC4. Moreover, only
YAC's yWXD6142 and yWXD2704 were positive for exon-8 of
GPC3. These data suggested that some YAC's might have
undergone internal deletions, and led to the
construction of a new BAC/PAC contig. Figure 13 shows the
BAC/PAC contig containing the entire GPC4 gene and
linking both GPC3 and GPC4. This contig indicates that
both glypicans form a tandem array with exon-1 and the
promoter region of GPC4 lying adjacent to the last exon
of GPC3. The GPC4 gene exon/intron structure is
schematically shown in Table 7.

EXAMPLE 3

Glypican involvement in the Simpson-Golabi-Behmel
syndrome

3.1 Introduction

Recently, deletions and translocations
involving the gene for glypican-3 (GPC3) have been shown
to occur in patients with the Simpson-Golabi-Behmel
overgrowth syndrome (Pilia et al., 1996). Not all
patients with this X-linked condition, however, are
affected by mutations of the GPC3 gene that can easily be
demonstrated (Lindsay et al., 1997). GPC4 was mapped by
the present inventors on Xq26, in close proximity to
GPC3, in an interval such that it would be deleted in at
least one family with SGBS (Pilia et al., family c and
figure 13C). Therefore, the possibility was investigated
of Xq mutations in patients with SGBS that affect GPC4,
in addition to or rather than GPC3. The results show that
in some patients this is indeed the case.

3.2 Materials and methods 

From the characterization of the corresponding
intron/exon boundaries in GPC4 (analysis of the BACs
described in example 2) and GPC3, new primers were
designed for the amplification of all exons. Genomic DNA



WO 99/37764 



PCT/EP99/00329 




was obtained from one newly identified patient, counseled
at the Center for Human Genetics (CME) of the University
of Leuven (with informed consent from the parents); from
the lymphoblastoid cell lines AG0817, AG0857, AG0893,
AG0946, AG0969, and FY0367 (database IDs) from the
European Collection of Cell Cultures (ECACC); and from
the fibroblastic cell lines GM13034, GM3884, GM0097
(ATCC), all established from patients with SGBS. All
patient DNAs were analyzed by PCR. The reaction cycles
were: 94°C for 30 sec, annealing temperature for 30 sec,
72°C for 30 sec, for 35 cycles. Cycling was preceded by a
2.5 minute incubation at 94°C. Primers and annealing
temperatures are given in Table 8. The reaction products
were analyzed by electrophoresis in 2% agarose gels. PCR
primer pairs were designed for amplification of all exons
of GPC4 (including exon/intron boundaries, Table 11) and
the corresponding PCR products were analyzed for
single-strand conformation polymorphisms (SSCP) in
non-denaturing polyacrylamide gels as described previously
(Matthijs et al., 1997). PCR products with variant SSCs
and controls were either directly sequenced after gel
purification or T/A cloned in pCR2.1 (Invitrogen).
Several independent clones from independent
amplifications were characterized by Dye Primer Cycle
Sequencing.

3.3 Results 

A summary of the PCR and SSC analyses is given
in Table 10. These analyses identified one patient with a
deletion that involved the entire GPC4 gene and part of
the GPC3 gene (exons 7 and 8). No other GPC4 deletions
were detected, but a partial deletion of GPC3 (exons 1
and 2) was also identified in the patient diagnosed at
CME. SSCA of the GPC3 exons revealed polymorphism for
exon 3. Two of the patients with a variant exon 3 were
brothers, and sequencing of the corresponding PCR
products identified a T>A mutation leading to a missense
mutation of R for W296 in glypican-3 ((a) in Table 10).
Interestingly, W296 corresponds to one of the residues that
are strictly conserved in all glypicans identified so
far. Deletion of one T nucleotide (del T875), leading to
a frame shift mutation and termination, was the basis for
the variant SSC of exon 3 in a third patient (b). SSCA of
the GPC4 exons revealed polymorphisms for the exons 7 and
8, in one and the same patient. Sequencing of the PCR
product of exon 7 in this patient identified a G>T
mutation leading to a substitution of D391 by S in
glypican-4 (c). Sequencing of the PCR product of exon 8
in this patient and controls identified a C>T
substitution leading to a substitution of V by A as
residue 442 in glypican-4 (d). It may be noted that the RACE
experiments also yielded V as residue 442 and D as
residue 391 (see figure 2), and that these residues have
not been conserved in the glypicans. Moreover, the IMAGE
Consortium cDNA clone for GPC4 also had a V at position
442, and the plasmid pKGP had an E at position 391.

Table 1

Primers used in the 5'-RACE-experiments for GPC6

5'-ATTCCACTCTGTGTCGAGGTCAGCCTGA-3' (1425-1452): first
primer (1)

5'-ACAGTATGGGCAGTACAGCATCTTCATG-3' (1329-1358): nested
primer (1)

5'-CCTGAGCCACTGGATTCATCACTTG-3' (2048-2072): first primer
(2)

5'-GCCATAATCXGCTGTCTGATGAAAGTG-3' (1956-1982): nested
primer (2)

(The numbers in parentheses refer to the corresponding
residues in the composite cDNA sequence shown in
figure 1).



Table 2

Percent amino acid identities between the glypicans

Table 3

Primers used in the RACE-experiments for GPC4

5'-CGAGCCCAGAAGTCATTTAGCATTTCTTCC-3': 5'-RACE GPC4

5'-CACAAACATATCATTCAGGGATTTCTCTGC-3': nested 5'-RACE GPC4

5'-ATTTGAAGATCTGTCCCCAGGGTTCTAC-3': 3'-RACE GPC4

5'-AGTGTGGTCAGCGAACAGTGCAATC-3': nested 3'-RACE GPC4

5'-CCAACTGTGATCTCGCCTTGTTTCT-3': nested 3'-RACE GPC4

Table 5



5'-TCATGACTAGTTTCTTGCACGG-3': STS MV1a
5'-TGAAAATCCACATGATTGGAAA-3': STS MV1b
5'-AAGCTTGAAGGGTGCTCAGA-3': STS MV2a
5'-ATTTCCTGCTGCTGGTCACT-3': STS MV2b
5'-TCTCCTTTCCCTGGACTAACC-3': STS MV3a
5'-TGAGTCAAATTAAAGAGCAAGGC-3': STS MV3b

Table 10

Patient ID

CLAIMS

1. Polynucleotide encoding a glypican-related
protein, identified herein as "glypican-6" and comprising
at least the coding sequence of the nucleotide sequence
as depicted in figure 1.

2. Glypican-6 gene (GPC6) in isolated form
comprising at least as the coding sequence the nucleotide
sequence as depicted in figure 1 optionally interrupted
by one or more introns, and optionally operably linked to
transcription and translation regulatory sequences.

3. Polynucleotide encoding a glypican-related
protein, identified herein as "glypican-4" and comprising
at least the coding sequence of the nucleotide sequence
as depicted in figure 2.

4. Glypican-4 gene (GPC4) in isolated form
comprising at least as the coding sequence the nucleotide
sequence as depicted in figure 2 optionally interrupted
by one or more introns, and optionally operably linked to
transcription and translation regulatory sequences.

5. Derivatives of the polynucleotide sequence
as depicted in figures 1 or 2, which derivatives are
selected from fragments of the gene as claimed in claim 2
or 4, either isolated or synthetic and having a length
that is smaller than the complete gene; primers,
comprising at least 10 consecutive gene specific
nucleotides, preferably about 20 gene specific
consecutive nucleotides of the nucleotide sequence of the
gene; longer oligonucleotides up to the full length of
the gene; antisense variants of the gene, the fragments
or the primers; antibodies directed to the gene,
fragments, primers or complementary strands thereof; any
specific ligand for DNA that can be used as a specific
probe, peptide nucleic acid probes.

6. Derivatives as claimed in claim 5, which
derivatives are selected from transcripts (mRNA
sequences) of the gene, cDNA, antisense RNA, antisense
cDNA, antibodies directed to the transcript, sense and
antisense cDNA, antisense RNA and any specific ligand for
RNA that can be used as a specific probe.

7. Derivatives as claimed in claim 5, which
derivatives comprise at least part of the amino acid
sequence encoded by the coding sequence of the nucleotide
sequence depicted in figure 1 or 2 and selected from the
isolated or synthetic gene product (protein or
polypeptide); isolated or synthetic peptides, comprising
a specific sequence of consecutive amino acids encoded by
the gene, antibodies directed to the gene product or
peptides and any specific ligand for peptides that can be
used as a specific probe.

8. Polynucleotides of claim 1 and 3, for use in
diagnosis and/or therapy.

9. Gene of claim 2 and 4 and/or derivatives of
claims 5-7 for use in diagnosis and/or therapy.

10. Method for diagnosing aberrations in a
glypican encoding gene, comprising isolation of the gene
from cells expected to be harboring an aberrant gene; and
comparing the nucleotide sequence of the gene thus
obtained with the nucleotide sequence of a wild type
gene.

11. Method as claimed in claim 10, wherein
comparing the nucleotide sequence of the gene to be
diagnosed with the nucleotide sequence of a wildtype gene
is performed by restriction fragment length polymorphism
screening, comprising separately digesting the isolated
gene and a wild type comparison gene with one or more
selected restriction enzymes, separating the digest thus
obtained on a gel to reveal a pattern of bands, and
comparing the patterns of the isolated gene and the
wildtype gene.

12. Method as claimed in claim 10, wherein
comparing the nucleotide sequence of the gene to be
diagnosed with the nucleotide sequence of a wildtype gene
is performed by means of Polymerase Chain Reaction (PCR),
between probes corresponding to various parts of the gene
to be diagnosed, for example exons, separating the
reaction mixture thus obtained on a gel to reveal a
pattern of bands, and comparing the patterns of the
isolated gene and the wildtype gene.

13. Method as claimed in claim 10, wherein
comparing the nucleotide sequence of the gene to be
diagnosed with the nucleotide sequence of a wildtype gene
is performed by single-strand conformation polymorphism
screening.

14. Method as claimed in claim 10, wherein
comparing the nucleotide sequence of the gene to be
diagnosed with the nucleotide sequence of a wildtype gene
is performed by DNA sequencing.

15. Method for the in situ detection of
physical changes in a glypican gene, like translocations,
inversions or deletions, by the in situ hybridization of
labeled probes with a set of chromosomes.

16. Method as claimed in claim 15, wherein
translocations can be detected by hybridizing a set of
chromosomes with a first probe that hybridizes to a part
of the glypican gene that is not likely to be involved in
the translocation, and a second probe that hybridizes to
a part of the glypican gene that is likely to be involved
in such aberration, wherein translocations are identified
when the second probe is detected on another chromosome
than the first probe.

17. Method as claimed in claim 16, further
comprising identification of the translocation partner,
and using in addition probes hybridizing to the
translocated part and the remaining part of the
translocation partner and bearing a different label from
the first set of probes.

18. Method as claimed in claim 15, wherein an
inversion is identified if the second probe is found
closer to or further away from the first probe than in a
non-aberrant chromosome.

19. Method as claimed in claim 15, wherein
deletions are detected when one or both of the probes are
not present on the aberrant chromosome.

20. Method as claimed in claims 15-19, wherein
the gene to be diagnosed is the glypican 3 gene and the
probes are as given in Table 10.

21. Method as claimed in claims 15-19, wherein
the gene to be diagnosed is the glypican 4 gene and the
probes are as given in Table 10 and/or 11.

22. Method as claimed in claims 15-19, wherein 
the gene to be diagnosed is the glypican 5 gene and the 
probes are as given in Table 13. 

23. Method as claimed in claims 15-19, wherein
the gene to be diagnosed is the glypican 6 gene and the
probes are as given in Table 14.

24. Method as claimed in claims 15-19, wherein
the gene to be diagnosed is the glypican 1 gene and the
probes are derivable from figure 3.

25. Method for diagnosing the expression
pattern of glypican genes, wherein antibodies directed to
the gene product or protein are reacted with Western
blots of cell extracts to detect the presence of the
protein in the cell or to assess the amount of protein
present.
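
By way of illustration only, the band-pattern comparison of claim 11 has a simple in-silico analogue: both sequences are cut at every occurrence of a recognition site and the resulting fragment lengths are compared. The following Python sketch assumes the EcoRI site (GAATTC, cut after the G) and uses short toy sequences; the function names, the choice of enzyme and the sequences are assumptions made for the example and are not taken from the description.

```python
# Illustrative in-silico analogue of the restriction fragment comparison of claim 11.
# The enzyme (EcoRI: GAATTC, cutting after the G), the toy sequences and all
# names are assumptions made for this example.


def digest(seq, site="GAATTC", cut_offset=1):
    """Return the sorted fragment lengths of a linear sequence cut at every site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - begin for begin, end in zip(bounds, bounds[1:]))


def patterns_differ(sample, wild_type, **digest_options):
    """True if the two sequences give different fragment-length patterns."""
    return digest(sample, **digest_options) != digest(wild_type, **digest_options)


wild_type = "AAAAGAATTCCCCCCCGAATTCTTTT"   # two EcoRI sites
mutant = "AAAAGAATTCCCCCCCGAGTTCTTTT"      # second site lost by a point change

print(digest(wild_type))
print(digest(mutant))
print(patterns_differ(mutant, wild_type))
```

A point change that destroys or creates a recognition site alters the fragment pattern, whereas changes outside the sites leave it unchanged unless they alter fragment lengths.
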
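
By way of illustration only, the interpretation rules of claims 16, 18 and 19 for a two-probe in situ hybridization can be written as a small decision procedure. In the Python sketch below, the class `ProbeSignal`, the function `classify`, the arbitrary position units and the tolerance used to decide whether the probe distance has changed are all hypothetical and are not part of the claimed methods.

```python
# Illustrative decision logic for the two-probe readout of claims 16, 18 and 19.
# ProbeSignal, classify, the position units and the tolerance are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeSignal:
    chromosome: Optional[str]  # chromosome carrying the signal; None if no signal
    position: Optional[float]  # position along that chromosome, arbitrary units


def classify(first, second, normal_distance, tolerance=0.2):
    """Classify a readout from the first and second probe signals."""
    # Claim 19: one or both probes are not present.
    if first.chromosome is None or second.chromosome is None:
        return "deletion"
    # Claim 16: second probe detected on another chromosome than the first.
    if first.chromosome != second.chromosome:
        return "translocation"
    # Claim 18: second probe closer to or further from the first than normal.
    observed = abs(second.position - first.position)
    if abs(observed - normal_distance) > tolerance * normal_distance:
        return "inversion"
    return "no aberration detected"


print(classify(ProbeSignal("X", 10.0), ProbeSignal("13", 42.0), normal_distance=2.0))
```
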